US Gun Deaths Data

The dataset comes from FiveThirtyEight and can be found at https://github.com/fivethirtyeight/guns-data. It is stored in the guns.csv file and contains information on gun deaths in the US from 2012 to 2014. Each row represents a single fatality, and the columns contain demographic and other information about the victim. Here are the first few rows of the dataset:


In [25]:
import csv
# Read the whole file into a list of rows; the with block closes the file afterwards
with open('guns.csv', 'r') as f:
    data = list(csv.reader(f))
print(data[:5])


[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]

In [26]:
#removing header row
headers = data[:1]
data = data[1:]
print(data[:5])


[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]

In [27]:
# Count how many times each value occurs in the year column

years = [each[1] for each in data]
year_counts = {}
for each in years:
    if each in year_counts:
        year_counts[each] += 1
    else:
        year_counts[each] = 1
print(year_counts)


{'2013': 33636, '2014': 33599, '2012': 33563}
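
The same counting pattern can also be written with `collections.Counter` from the standard library. The sketch below is only an illustration: `count_column` is a hypothetical helper, and the rows are made-up values rather than the notebook's data.

```python
from collections import Counter


def count_column(rows, index):
    """Count how often each value appears in one column of a list of rows."""
    return Counter(row[index] for row in rows)


# Made-up rows shaped like the records in guns.csv (illustration only)
sample_rows = [
    ['1', '2012', '01', 'Suicide'],
    ['2', '2012', '02', 'Homicide'],
    ['3', '2013', '01', 'Suicide'],
]
sample_year_counts = count_column(sample_rows, 1)
print(dict(sample_year_counts))
```

A `Counter` behaves like a dictionary, so it could be used wherever `year_counts` is used.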

In [28]:
#Let's see if gun deaths in the US change by month and year
import datetime
dates = [datetime.datetime(year=int(each[1]), month=int(each[2]), day=1) for each in data] 
date_counts = {}
for each in dates:
    if each in date_counts:
        date_counts[each] += 1
    else:
        date_counts[each] = 1
dates[:5]


Out[28]:
[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]
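
The cell above builds `date_counts` but only displays the first few dates. A minimal sketch of listing such counts in chronological order might look like this. It uses made-up dates, and `counts_by_month` is a hypothetical helper.

```python
import datetime


def counts_by_month(dates):
    """Return (date, count) pairs sorted chronologically."""
    counts = {}
    for d in dates:
        counts[d] = counts.get(d, 0) + 1
    # datetime objects compare chronologically, so sorting the items orders them by date
    return sorted(counts.items())


# Made-up dates for illustration; each one stands for the first day of a month
sample_dates = [
    datetime.datetime(2012, 2, 1),
    datetime.datetime(2012, 1, 1),
    datetime.datetime(2012, 2, 1),
]
for month_start, count in counts_by_month(sample_dates):
    print(month_start.strftime('%Y-%m'), count)
```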

The sex and race columns contain potentially interesting information on how gun deaths in the US vary by gender and race. Exploring both of these columns can be done with a similar dictionary counting technique to what we did earlier.


In [29]:
sex_counts = {}
race_counts = {}

for each in data:
    sex = each[5]
    if sex in sex_counts:
        sex_counts[sex] += 1
    else:
        sex_counts[sex] = 1

for each in data:
    race = each[7]
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
print(race_counts)
print(sex_counts)


{'Black': 23296, 'Asian/Pacific Islander': 1326, 'Hispanic': 9022, 'White': 66237, 'Native American/Native Alaskan': 917}
{'M': 86349, 'F': 14449}

However, our analysis only gives us the total number of gun deaths by race in the US. Unless we know how many people of each race live in the US, we can't meaningfully compare those numbers. What I want is the rate of gun deaths per 100,000 people of each race.


In [30]:
# Read the census population counts; the with block closes the file afterwards
with open('census.csv', 'r') as f:
    census = list(csv.reader(f))
census


Out[30]:
[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

In [31]:
mapping = {
    'Asian/Pacific Islander': int(census[1][14]) + int(census[1][15]),
    'Black': int(census[1][12]),
    'Native American/Native Alaskan': int(census[1][13]),
    'Hispanic': int(census[1][11]),
    'White': int(census[1][10])
}
race_per_hundredk = {}

for key, value in race_counts.items():
    result = race_counts[key] / mapping[key] * 100000
    race_per_hundredk[key] = result
race_per_hundredk


Out[31]:
{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}
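
The calculation above is the number of deaths divided by the population, scaled to 100,000 people. As a small worked sketch with made-up numbers (the helper name `rate_per_100k` is hypothetical):

```python
def rate_per_100k(count, population):
    """Return the number of events per 100,000 people."""
    if population <= 0:
        raise ValueError('population must be positive')
    # Multiply before dividing to keep the arithmetic exact for round numbers
    return count * 100000 / population


# Made-up figures for illustration: 50 deaths in a population of 200,000
sample_rate = rate_per_100k(50, 200000)
print(sample_rate)
```

Keep in mind that the population figures come from the April 1, 2010 census while the deaths span 2012 to 2014, so each rate compares three years of deaths against a single population snapshot.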

In [32]:
#We can filter our results, and restrict them to the Homicide intent

intents = [each[3] for each in data]
races = [each[7] for each in data]
homicide_race_counts = {}
for i, each in enumerate(races):
    if intents[i] == 'Homicide':
        # Start at 0 for a new race, then count the current row as well
        if each not in homicide_race_counts:
            homicide_race_counts[each] = 0
        homicide_race_counts[each] += 1
homicide_race_counts


Out[32]:
{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}

In [33]:
homicide_race_per_hundredk = {}

for key, value in homicide_race_counts.items():
    # Rounded to two decimals for readability
    result = round(value / mapping[key] * 100000, 2)
    homicide_race_per_hundredk[key] = result
homicide_race_per_hundredk


Out[33]:
{'Asian/Pacific Islander': 3.53,
 'Black': 48.47,
 'Hispanic': 12.63,
 'Native American/Native Alaskan': 8.72,
 'White': 4.64}
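
The filter-then-count step can be wrapped in a small reusable function. This is only a sketch: `count_where` is a hypothetical helper, and the rows are made-up values that use the same column positions as guns.csv (intent at index 3, race at index 7).

```python
def count_where(rows, count_index, filter_index, filter_value):
    """Count values in one column, using only rows where another column matches."""
    counts = {}
    for row in rows:
        if row[filter_index] == filter_value:
            key = row[count_index]
            counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up rows for illustration; columns follow the guns.csv layout
sample_rows = [
    ['1', '2012', '01', 'Homicide', '0', 'M', '34', 'White'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White'],
    ['3', '2012', '02', 'Homicide', '0', 'M', '60', 'Black'],
    ['4', '2012', '02', 'Homicide', '0', 'M', '64', 'White'],
]
sample_homicides = count_where(sample_rows, 7, 3, 'Homicide')
print(sample_homicides)
```

Using `counts.get(key, 0) + 1` counts the first matching row as well, which avoids an off-by-one error when a key is seen for the first time.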

Finding

I have found that some racial categories in the US have a much higher gun-related homicide rate than others. For example, at least as evidenced by these statistics, Black people are killed in gun-related homicides at roughly 10 times the rate of White people and nearly 4 times the rate of Hispanic people. Note that the dataset records victims, so these rates describe who is killed, not who commits the homicides.
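
To compare groups directly, each rate can be divided by the rate of a reference group. The sketch below uses made-up rates and a hypothetical helper `rate_ratios`; it does not use the notebook's actual data.

```python
def rate_ratios(rates, reference):
    """Divide every rate by the rate of the reference group."""
    base = rates[reference]
    return {group: round(rate / base, 2) for group, rate in rates.items()}


# Made-up rates per 100,000 for illustration only
sample_rates = {'A': 40.0, 'B': 10.0, 'C': 4.0}
sample_ratios = rate_ratios(sample_rates, 'C')
print(sample_ratios)
```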

Next, let's find out whether gun-related homicides vary by month.


In [34]:
month_homicide_rate = {}
months = [int(each[2]) for each in data]
for i, each in enumerate(months):
    if intents[i] == 'Homicide':
        # Start at 0 for a new month, then count the current row as well
        if each not in month_homicide_rate:
            month_homicide_rate[each] = 0
        month_homicide_rate[each] += 1
month_homicide_rate


Out[34]:
{1: 2829,
 2: 2178,
 3: 2780,
 4: 2845,
 5: 2976,
 6: 3130,
 7: 3269,
 8: 3125,
 9: 2966,
 10: 2968,
 11: 2919,
 12: 3191}

In [67]:
def months_diff(input_dict):
    # Seed both extremes from month 1 so max_key and min_key are always valid months
    max_value = input_dict[1]
    max_key = 1
    min_value = input_dict[1]
    min_key = 1

    for key, value in input_dict.items():
        if value > max_value:
            max_value = value
            max_key = key
        if value < min_value:
            min_value = value
            min_key = key
    # Ratio of the month with the most homicides to the month with the fewest
    gap = round((max_value / min_value), 2)
    
    print ('max month is',max_key,'has',max_value,'and min month is',min_key,'has',min_value,'. The gap between min and max months is',gap,'!')

In [68]:
months_diff(month_homicide_rate)


max month is 7 has 3269 and min month is 2 has 2178 . The gap between min and max months is 1.5 !
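
The same comparison can be written with the built-in `max` and `min` functions. The sketch below uses made-up monthly counts, and `busiest_and_quietest` is a hypothetical name.

```python
def busiest_and_quietest(counts):
    """Return (max_key, min_key, ratio) for a dictionary of counts."""
    # key=counts.get makes max/min compare the stored values instead of the keys
    max_key = max(counts, key=counts.get)
    min_key = min(counts, key=counts.get)
    ratio = round(counts[max_key] / counts[min_key], 2)
    return max_key, min_key, ratio


# Made-up monthly counts for illustration only
sample_counts = {1: 120, 2: 80, 3: 100}
print(busiest_and_quietest(sample_counts))
```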

As we can see, there is a link between the month of the year and the number of gun-related homicides. In July, gun-related homicides are committed 1